Fix infinite retry when a single host fails with server error #635

Copilot · 2025-12-27T11:24:40Z

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass tests.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.

Description

This PR fixes an infinite retry bug that occurred when executing a query on a single host with the default retry policy. When the query failed with a server error, it would retry infinitely without delays.

Root Cause

In the _make_query_plan() method in cassandra/cluster.py, when a specific host was provided (via the host parameter), the query_plan was set to a list [self._host] instead of an iterator. This caused the send_request() method to iterate over the list from the beginning on each retry attempt, leading to infinite retries.
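The difference between the two plan types can be seen in a small standalone sketch. It does not use the driver: `first_host` is a hypothetical stand-in for the loop in send_request(), and the host string is a placeholder. It only illustrates, under those assumptions, that a `for` loop over a list restarts from the first element on every call, while a shared iterator resumes where it left off.

```python
def first_host(query_plan):
    """Hypothetical stand-in for the loop in send_request():
    return the next host to try, or None if the plan is exhausted."""
    for host in query_plan:
        return host
    return None


host = "127.0.0.1"  # placeholder address, not taken from the PR

# A list is re-iterated from the start on every call, so the same host
# is handed out again and again.
list_plan = [host]
list_attempts = [first_host(list_plan) for _ in range(3)]

# An iterator keeps its position between calls, so the single host is
# handed out once and the plan is then exhausted.
iter_plan = iter([host])
iter_attempts = [first_host(iter_plan) for _ in range(3)]

print(list_attempts)
print(iter_attempts)
```

With the list, every call yields the host again; with the iterator, only the first call does, which is the behavior the fix relies on.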

Changes Made

cassandra/cluster.py (line 4550): Changed self.query_plan = [self._host] to self.query_plan = iter([self._host]) to ensure the query plan is an iterator that gets exhausted after one attempt, preventing infinite retries.
tests/unit/test_response_future.py: Added a new test test_single_host_query_plan_exhausted_after_one_retry that reproduces the bug and verifies the fix prevents infinite retries when a single host fails with a server error.
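As a rough illustration of what such a test checks, here is a toy model rather than the test added in this PR. `ToyResponseFuture`, `run_with_retries`, and the `max_attempts` safety cap are all hypothetical names invented for this sketch; the retry policy is assumed to always ask for another attempt after a server error.

```python
class ToyResponseFuture:
    """Hypothetical, simplified model of a response future (not the driver class)."""

    def __init__(self, query_plan):
        self.query_plan = query_plan
        self.attempts = 0
        self.final_error = None

    def send_request(self):
        # Take the next host from the plan; every attempt is assumed to
        # end in a server error, so the caller will ask to retry.
        for _host in self.query_plan:
            self.attempts += 1
            return True
        # Plan exhausted: report a terminal error instead of retrying.
        self.final_error = "NoHostAvailable"
        return False


def run_with_retries(future, max_attempts=100):
    """Retry until the plan is exhausted; max_attempts only guards the demo."""
    while future.attempts < max_attempts and future.send_request():
        pass
    return future


buggy = run_with_retries(ToyResponseFuture(["host-1"]))
fixed = run_with_retries(ToyResponseFuture(iter(["host-1"])))

print(buggy.attempts, buggy.final_error)
print(fixed.attempts, fixed.final_error)
```

In this model the list-based plan only stops because of the artificial cap, while the iterator-based plan stops after a single attempt and ends with a terminal error.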

Testing

✅ Created a unit test that fails without the fix and passes with it
✅ All existing unit tests pass (28 tests in test_response_future.py, 11 tests in test_cluster.py)
✅ Security checks (CodeQL) passed with no issues

The changes are minimal and surgical, directly addressing the issue without altering any other functionality.

Original prompt

This section details the original issue you should resolve.

<issue_title>infinite retry when a single host fails with server error</issue_title>
<issue_description>When executing a query on a single host with the default retry policy, if the query fails with a server error, the query is retried infinitely and without delays.

Consider this simple test:
    servers = await manager.servers_add(1, auto_rack_dc="dc1")
    cql, hosts = await manager.get_ready_cql(servers)
    async with new_test_keyspace(manager, "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1}") as ks:
        await cql.run_async(f"CREATE TABLE {ks}.t2 (pk int, ck int, v int, PRIMARY KEY (pk, ck))")
        await manager.api.enable_injection(servers[0].ip_addr, "fail_mutate_internal", one_shot=False)
        await cql.run_async(f"INSERT INTO {ks}.t2(pk, ck, v) VALUES(0,0,0)")
The query fails immediately with a NoHostAvailable exception:
>           await cql.run_async(f"INSERT INTO {ks}.t2(pk, ck, v) VALUES(0,0,0)")
E           cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.193.50.1:9042 dc1>: <Error from server: code=0000 [Server error] message="Error injection in mutate_internal">})
However, suppose we change the query to run on a specific host with host=hosts[0]:
        await cql.run_async(f"INSERT INTO {ks}.t2(pk, ck, v) VALUES(0,0,0)", host=hosts[0])
Now we see an endless stream of Server error logs:
11:16:34.820 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.820 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.820 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.821 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.821 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.822 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.822 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.822 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.823 WARNING>  Host 127.99.46.1:9042 error: Server error.
11:16:34.823 WARNING>  Host 127.99.46.1:9042 error: Server error.
I suspect the problem is that in send_request it expects query_plan to be an iterator, but in make_query_plan if there is a single host then it creates a list.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes infinite retry when a single host fails with server error #534

Co-authored-by: mykaul <[email protected]>

mykaul · 2026-01-07T16:25:12Z

@copilot - can you go back to this?

Copilot · 2026-01-07T16:28:29Z

@copilot - can you go back to this?

I've completed the remaining tasks for this PR. All code review and security checks have passed. The fix is minimal and focused:

Changed cassandra/cluster.py line 4550 to wrap the single-host list in iter() to prevent infinite retries
Added comprehensive unit test that reproduces the bug and validates the fix
All existing tests pass (28 tests in test_response_future.py, 11 tests in test_cluster.py)
Security analysis (CodeQL) passed with no issues

The PR is now ready for review.

Initial plan

36b054c

Copilot AI assigned Copilot and mykaul Dec 27, 2025

Copilot started work on behalf of mykaul December 27, 2025 11:25

Fix infinite retry when single host fails with server error

02aef83

Co-authored-by: mykaul <[email protected]>

Copilot AI requested a review from mykaul December 27, 2025 11:32

Copilot started work on behalf of mykaul January 7, 2026 16:25

Copilot AI changed the title ~~[WIP] Fix infinite retry issue when a host fails with server error~~ Fix infinite retry when a single host fails with server error Jan 7, 2026

Copilot finished work on behalf of mykaul January 7, 2026 16:29

mykaul marked this pull request as ready for review January 7, 2026 16:42

mykaul approved these changes Jan 7, 2026
